Neuropsychiatric disorders such as schizophrenia, bipolar disorder and Alzheimer's disease are major public 
health problems. However, despite decades of research, we currently have no validated prognostic or diagnostic 
tests that can be applied at an individual patient level. Many neuropsychiatric diseases are due to a combination 
of alterations that occur in a human brain rather than the result of localized lesions. While there is hope that 
newer imaging technologies such as functional and anatomic connectivity MRI or molecular imaging may offer 
breakthroughs, the single biomarkers that are discovered using these datasets are limited by their inability to 
capture the heterogeneity and complexity of most multifactorial brain disorders. Recently, complex biomarkers 
have been explored to address this limitation using neuroimaging data. In this manuscript we consider the nature 
of complex biomarkers being investigated in the recent literature and present techniques to find such biomarkers 
that have been developed in related areas of data mining, statistics, machine learning and bioinformatics. 

© 2013 The Authors. Published by Elsevier Inc. All rights reserved. 
1. Introduction 

Public health consequences of neurological and mental disorders, 
such as Alzheimer's disease (AD), bipolar disorder, or schizophrenia 
(SZ) are enormous. Yet, critical needs for reliable biomarkers for early 
detection and prognostic prediction of such disorders are still unmet 
(Kubicki et al., 2007; MacDonald and Schulz, 2009; Pettersson-Yeo 
et al., 2011). The purpose of this article is to review different data mining, 
machine learning, and statistical techniques that can help unearth 
biomarkers relevant to neurological disease using data from imaging 
studies. Since the neuroimaging community has also been contributing 
to the development of informatics tools for biomarker discovery, the 
techniques reviewed include those that the neuroimaging community 
already uses. All these techniques could further improve the 
complex biomarker discovery process, with eventual use in clinical 
settings. 

Neuroimaging technologies such as volumetric MRI, functional MRI 
(fMRI) and diffusion tensor imaging (DTI) are in wide use to indirectly es- 
timate altered cortical tissue, functional and physical connections in neu- 
ropsychiatric disease states (Honey et al., 2009; Park et al., 2008). 
Volumetric MRI measures the cortical thickness of a region, whereas 
fMRI and DTI allow one to construct a brain network for a subject 
where each defined region (e.g., dorsolateral prefrontal cortex, CA1 region 
of the hippocampus) in the brain is termed a "node" and a functional/ 
physical connection (e.g., frontal-hippocampal connectivity) is termed 
an "edge" (Bullmore and Bassett, 2011; Sporns, 2011). The volumetric 
features or the edges measured from fMRI or DTI, referred to as 'features' 
henceforth, provide an opportunity to study the altered properties 
underlying neuropsychiatric diseases. These features can be binary 
(representing a healthy volume of a region, or the presence or absence 
of a connection) or weighted (indicating volume or strength of a connec- 
tion). Features of the brain measured from multiple subjects are then 
used to predict a phenotype of interest (Ragland et al., 2007). Phenotypes 
can be symptoms such as cognition, depression or mania, or a disease 
diagnosis such as SZ (Drevets and Todd, 2005). A set of features that 
show different properties in different subgroups of the phenotype is re- 
ferred to as a "biomarker" in the rest of this paper. In the case of a binary 
biomarker, the set of features could be (mostly) present in subjects of one 
group and not present in the subjects of the other group, and in the case 
of a continuous biomarker they could have high values in one group and 
low values in the other group. 

Research in neuroimaging data has focused on exploring the hy- 
pothesis that mental disorders manifest due to the loss of cortical tissue 
or altered connectivity in the brain, i.e., reduction in the temporal lobe 
volume, aberrant connectivity within the default network or attention 
network that in turn disrupts cognitive functions (Fornito and 
Harrison, 2012; Stephan et al., 2009). A vast majority of these studies 
(Jafri et al., 2008; Li et al., 2010; Liang et al., 2006; Luck et al., 2011) 
focus on discovering the features that individually show a different de- 
gree of volume or neural connectivity in disease subjects when com- 
pared with healthy subjects. 

While insightful, this direction of research has not yet yielded any 
conclusive causal factors for major mental disorders. This is likely due 
to several well-known challenges. First, the large number of individual 
factors, such as thousands of edges, makes it difficult to find statistically 
significant single markers without sufficiently large study samples. In 
particular, multiple hypothesis testing resulting from the enormous 
number of potential hypotheses increases the chances of statistical er- 
rors, i.e., mistaking spurious patterns for real ones. Second, the complex- 
ity of the diseases being considered makes it unlikely that meaningful 
predictive patterns can be found by only looking at individual factors 
and largely ignoring their interrelations. Third, many diseases are het- 
erogeneous by nature, i.e., patients with a particular disease may form 
different subgroups, and biomarkers appropriate for one subgroup 
may not apply to another. Given the inability of many commonly used 
analytic techniques to handle these challenges (statistical significance, 
disease complexity, and disease heterogeneity), it is no surprise that 
even when statistically significant biomarkers are found by one group 
in one study, they are rarely reproduced in follow-up studies by other 
groups or sometimes by even the same group (Kubicki et al., 2007; 
Pettersson-Yeo et al., 2011). 

Research in biomarker discovery from neuroimaging data is at a cru- 
cial juncture where the field is beginning to acknowledge the need for 
complex multivariate analysis based techniques instead of currently 
used univariate analysis to capture the complex mechanisms underlying 
disease. Existing clinical studies demonstrate that there is an increase in 
predictive power for models built using a combination of imaging features 
when compared to that of single features (Bressler and Menon, 2010; 
Westman et al., 2013; Wolz et al., 2011). Existing studies also show 
that although SZ is widely treated as a single phenotype, there exist 
two different subgroups of subjects (those with good outcome and 
those with poor outcome) that exhibit different structural properties 
in the brain (Mitelman et al., 2003; Nenadic et al., 2012). For example, 
subjects with poor outcome had significantly smaller temporal and oc- 
cipital lobe gray matter volumes (Mitelman et al., 2003). These observa- 
tions in early clinical studies (Bressler and Menon, 2010; Westman et al., 
2013; Wolz et al., 2011) show the need to design computational 
methods that can be used to mine complex biomarkers. 

There are several ways of defining complex biomarkers that are rel- 
evant to a neuropsychiatric disease. For example, a simultaneous reduc- 
tion in volumes of multiple regions, or loss of a set of edges (e.g., left 
frontal-hippocampal connectivity plus right frontal-hippocampal con- 
nectivity) could result in a disease, even though a reduction in volume 
of one region or a loss of one edge (e.g., left frontal-hippocampal con- 
nectivity alone) does not result in a disease. In fact, a few recent studies 
(including Westman et al. (2013) and Wolz et al. (2011)) have shown 
that models built using a combination of features result in more predic- 
tive power than univariate approaches. In contrast, it is possible that an 
fMRI study of a disease might find hundreds of edges altered when com- 
pared to controls, of which only the loss of a specific subset of edges 
might cause changes to the functional network structure that result in 
disease. Likewise, it is also possible that only the loss of a set of edges 
that belong to a specific functional group (e.g., "executive network") 
may result in loss of executive functioning in a disease such as SZ or ge- 
riatric depression. We refer to the edge sets belonging to a functional 
group as brain 'pathways'. Exploring such complex types of alterations 
in biomarker data could potentially improve the reproducibility and sta- 
tistical power of imaging studies. This paper presents a set of techniques 
that attempt to identify complex biomarkers that may manifest in any 
of the above scenarios. 

We define four types of complex biomarkers (and analytic tech- 
niques) based on different interesting combinations discussed above: 
(i) Linear biomarkers, (ii) Combinatorial biomarkers, (iii) Pathway bio- 
markers, and (iv) Network biomarkers. Several techniques developed in 
data mining, machine learning, and genomic data analysis communities 
can be helpful in discovering these four different types of biomarkers by 
overcoming some of the known challenges. A similar classification 
scheme for biomarkers has been used in genomic studies (Ayers and 
Cordell, 2010; Chuang et al., 2007; Fang et al., 2012a; Holden et al., 
2008). A number of existing studies have analyzed neuroimaging data 
sets obtained from multiple technologies such as fMRI, DTI, and PET 
data collected on the same set of subjects to study group differences. 
However, in this manuscript we focus on only those studies that analyze 
one type of neuroimaging data set. 

2. Linear biomarkers 

Given a dataset of features (structural information, edges in anatom- 
ic or functional networks) obtained from the brain of several subjects 
and a continuous valued phenotype of interest for these subjects (e.g., 
cognition, psychosis ratings), a linear biomarker is a weighted sum of 
the features that is predictive of the phenotype. The computational 
problem here is the estimation of the weights such that the weighted 
sum is most predictive of the phenotype. A traditional approach to estimate 
these weights is to use a linear regression model (Friedman et al., 
2001), where the features for a set of subjects are represented as a matrix 
X, whose rows are subjects and columns are features obtained from 
neuroimaging techniques, as shown in Fig. 1(a). The phenotype is represented 
as a column vector Y, whose rows are subjects. The linear regression 
model then estimates a vector β such that Y = Xβ + ε, where ε 
accounts for the error. The heart of the model, Y = Xβ, is depicted in 
Fig. 1(b). This model is solved such that the sum of the squared errors is 
minimized, i.e., β assigns a weight to each feature in the dataset in such 
a way that the weighted sum of all features (Xβ) approximates the 
phenotype (Y) with minimal error (ε). The advantage of using linear 
regression based models lies in the availability of well-documented 
standard software. Using linear regression, Kubicki et al. (2011) showed 
that the gray matter volumes of the Superior Temporal Gyrus and Inferior 
Frontal Gyrus, and the functional and anatomical connections between 
them, were predictive of hallucinations in SZ. Note that individual correlations 
between each feature and the phenotype did not yield any significant 
associations. 

Fig. 1. Illustration of linear biomarker discovery: (a) matrix representation by treating edges in the brain as features, (b) linear regression setup where X represents the features (edges in brain networks or volumetric information) for all subjects, β represents the weights for features, and Y represents the phenotype value for each subject, and (c) resultant β from linear regression and LASSO. 

However, there are several challenges that arise when linear regression 
is applied to neuroimaging datasets. One challenge is high dimensionality, 
i.e., the large number of features in the brain: the number of nodes 
(voxels) in the brain is of the order of 100,000, which results in a very 
large number of edges. This leads to a computational 
challenge for a traditional linear regression scheme. Another challenge 
is that only a few features (e.g., a few functional edges in the brain out 
of the billions of edges) are expected to be associated with a given phe- 
notype. A traditional linear regression model generally assigns weights 
over all the features in an effort to find the best association plausible. A 
typical weight assignment is similar to the one shown in Fig. 1(c). 
Although one could potentially select the features that are weighted 
highly by the model as relevant features for a given phenotype, it is 
often unclear what the right number of features is, and therefore such 
an approach could result in an erroneous discovery of associated 
features. 

A variant of linear regression called LASSO (Least Absolute Shrinkage 
and Selection Operator), developed in the machine learning and statistics 
domains, can address the two key challenges that arise for linear biomarker 
discovery (Friedman et al., 2001; Wu and Lange, 2008). LASSO 
introduces a penalty on β (in addition to the sum of squared errors) 
such that the sum of the absolute values of all elements of β is small. When 
this model is used to estimate β using the matrix X and the vector Y, it results 
in a β vector where most components are zero, and any non-zero 
element could be indicative of an edge relevant to the phenotype. 
Fig. 1(c) shows a typical vector that results from a LASSO type model, 
where only a few values in β are nonzero. This model allows for automatic 
selection of relevant edges without having to choose a parameter 
for the number of features as in the case of a general linear regression 
model. Efficient approaches are available to handle the high dimensional 
nature of the datasets. LASSO type models were also found to be 
promising in genomic case-control data analysis, where there are tens 
to hundreds of samples and up to hundreds of thousands of genomic 
features like SNPs and gene expression (Ayers and Cordell, 2010; Beck 
et al., 2011; Ghosh and Chinnaiyan, 2005). 
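
The contrast between an unpenalized fit and a LASSO fit can be illustrated with a minimal sketch. The code below is not the implementation used in any of the cited studies; it is a plain cyclic coordinate descent solver for the objective (1/2n)·||Y - Xβ||² + λ·Σ|β_j|, written under several assumptions: the toy data, the names `soft_threshold` and `lasso_coordinate_descent`, the penalty value 0.3, and the omission of an intercept and of feature standardization are all illustrative choices.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; any value within [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta because beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual after adding back
            # the current contribution of feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Hypothetical toy data: 60 subjects, 8 features; only features 0 and 3
# contribute to the phenotype.
rng = random.Random(0)
n_subjects, n_features = 60, 8
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_subjects)]
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0.0, 0.1) for row in X]

beta_unpenalized = lasso_coordinate_descent(X, y, lam=0.0)  # least squares fit
beta_lasso = lasso_coordinate_descent(X, y, lam=0.3)        # sparse fit
selected = [j for j, b in enumerate(beta_lasso) if b != 0.0]

print("unpenalized:", [round(b, 3) for b in beta_unpenalized])
print("LASSO:      ", [round(b, 3) for b in beta_lasso])
print("selected features:", selected)
```

With the penalty set to zero the solver assigns some weight to every feature, whereas with a positive penalty the soft-thresholding step sets the weights of weakly related features to exactly zero. The price is that the surviving weights are shrunk toward zero, so they should be read as indicators of relevance rather than as unbiased effect sizes.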

Linear biomarker approaches have shown promise in discovering 
imaging features that could explain group differences in attention deficit 
hyperactivity disorder (ADHD) (Bohland et al., 2012), AD (Liu et al., 2012) and 
neuro-cognitive deficits (Bunea et al., 2011). Recently, Bohland et al. (2012) used a LASSO type 
approach to select relevant features from anatomical and functional 
network measures in combination with non-neuroimaging features to 
predict ADHD in individual 
subjects among a mixed group of patients and controls. They noticed 
that the features selected from all three modalities resulted in the best 
performance on a test set. Liu et al. (2012) used a LASSO model with 
spatial constraints to find the set of imaging features (T1-weighted 
baseline MR brain images) that show increased prediction accuracy between 
AD and mild cognitive impairment. Bunea et al. (2011) demonstrated 
the use of penalized least squares regression approaches to 
predict neuro-cognitive deficits using a dataset that comprises DTI 
and brain volumetric measures from HIV-infected subjects. Logistic 
and linear regression models have been previously implemented in 
many statistical packages such as R, SAS, and Matlab and are easily avail- 
able for use by the scientific community for analysis. 

3. Combinatorial biomarkers 

A key assumption underlying linear regression based techniques 
presented in the previous section is that the discovered biomarkers 
are valid across all the subjects in the study (i.e., the disease is 
homogeneous). However, this assumption does not always hold true, due to 
disease and population heterogeneity. Different subsets of patients tend to 
have different factors that drive the phenotype of interest. For example, 
about 50% of patients with Mild Cognitive Impairment (MCI) are amy- 
loid scan positive but the other 50% are not, thus suggesting that MCI 
is a greatly heterogeneous condition. In this section, we focus on bio- 
markers that can capture the "subspace" scenarios. In particular, we 
focus on techniques from the data mining area of association analysis 
(Agrawal and Srikant, 1994; Han et al., 2007; Pang-Ning et al., 2006), 
which has well developed approaches for finding patterns (biomarkers) 
in data sets with binary features and binary outcomes (e.g., phenotype 
or disease label). 

Given a dataset of neuroimaging features such as volumetric information 
or functional connections (edges), this information can be translated 
into a set of binary features, where each feature records the 
presence of a characteristic of interest with a 1, e.g., a volume being 
high or low, or an area being active or inactive. A phenotype of interest 
(e.g., SZ versus healthy) is also represented 
as a binary variable, typically with a 1 indicating presence of a 
disease and a 0 indicating absence (a control). A combinatorial biomarker 
is a subset of features that are present mostly in one group of subjects. 
Note that the combinatorial biomarker is only relevant for those subjects 
in which the features in the subset are all present. Consider the example 
shown in Fig. 2(a), where X is a dataset whose rows are subjects and whose 
columns are features; values of 1 (shown in black) indicate 
the presence of a feature, while values of 0 (shown in white) indicate 
its absence. The grouping of subjects is represented by a 
column vector Y, where black indicates SZ and white indicates 
healthy. The submatrix A in X represents two features that are 
both present in a subset of four subjects, all of whom belong to the SZ 
group. Therefore, A is a combinatorial biomarker that is associated 
with SZ. Note that there can be many combinatorial biomarkers in a 
given dataset. In this example, submatrix B is associated with SZ and 
submatrices C and D are associated with healthy subjects. One could 
argue that the features could be discovered by individual testing and 
then grouped together to recover the submatrices A, B, C, and D. 



However, there could be scenarios where the individual features them- 
selves are not informative about the phenotype but together they have 
more information about the phenotype. Individually, the columns 
representing submatrix A in Fig. 2(b) are equally frequent in healthy 
and SZ groups; however, the columns in A together are present only in 
the SZ group. Therefore, such biomarkers cannot be discovered using 
traditional linear regression type techniques or by univariate testing. 
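
The scenario of Fig. 2(b) can be made concrete with a small hypothetical example. The eight subjects and two binary features below are invented for illustration, and `support` is an assumed helper name; the sketch only counts how often a set of features is jointly present in each group.

```python
def support(rows, features):
    """Count the subjects (rows) in which every listed feature equals 1."""
    return sum(1 for row in rows if all(row[f] == 1 for f in features))


# Hypothetical binary data: each row is a subject, each column a feature.
sz = [[1, 1], [1, 1], [0, 0], [0, 0]]
healthy = [[1, 0], [1, 0], [0, 1], [0, 1]]

for features in ([0], [1], [0, 1]):
    print(features, "SZ:", support(sz, features),
          "healthy:", support(healthy, features))
```

Each feature on its own is present in two SZ subjects and two healthy subjects, so a univariate test sees no group difference. The pair of features, however, is jointly present in two SZ subjects and in no healthy subject, which is exactly the kind of signal a combinatorial biomarker is meant to capture.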

Combinatorial biomarkers are substantially different from linear bio- 
markers in that each combinatorial biomarker potentially explains a 
subset of subjects, whereas a linear biomarker is expected to cover all 
the subjects in the study. This gives combinatorial biomarkers more 
flexibility to capture the heterogeneous nature of the subjects and 
their associated signals in the data. For example, submatrices A and B 
cover different subsets of subjects that have the phenotype in 
Fig. 2(a). This strength however leads to additional challenges: compu- 
tational complexity and statistical significance assessment. Approaches 
for discovering combinatorial biomarkers have to explore the space of 
all possible combinations of edges in the brain to discover these biomarkers 
exhaustively (Fang et al., 2012b). For a set of n edges, the number 
of all possible combinations is of the order of 2^n - 1. This further leads 
to an additional challenge of statistical significance due to multiple hypothesis 
testing. When 2^n - 1 hypotheses are tested, there is a much 
greater chance that some of them will appear significant purely by chance. 
Therefore, the statistical significance values have to be adjusted to ac- 
count for this occurrence. 
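
To give a sense of scale, the short calculation below counts the non-empty feature subsets for a hypothetical set of only 20 edges and shows how strict a per-test threshold becomes under a Bonferroni correction. The Bonferroni rule is used here purely as a familiar illustration and is an assumption of this sketch; the function names and the values n = 20 and alpha = 0.05 are likewise illustrative.

```python
def n_combinations(n_features):
    """Number of non-empty subsets of n features: 2^n - 1."""
    return 2 ** n_features - 1


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_tests = n_combinations(20)
print("hypotheses for 20 edges:", n_tests)
print("per-test threshold at alpha = 0.05:", bonferroni_threshold(0.05, n_tests))
```

Even for 20 edges there are more than a million candidate combinations, and real brain networks contain far more edges, which is why both efficient search and an explicit treatment of multiple testing are needed.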

Fig. 2. Illustration of combinatorial biomarker discovery: (a and b) X is a hypothetical data matrix where columns represent features derived from neuroimaging data and rows represent subjects. The subjects belong to two groups, healthy and schizophrenia (SZ), as indicated by the column vector Y. In matrix X, an element (row, column) with black color indicates that the feature is present for a given subject. A, B, C, and D are interesting submatrices in X that have information about Y. The columns representing these submatrices in (a) are individually associated with Y, but those in (b) are not. (c) Efficient search space pruning: the Apriori principle allows pruning of supersets when a set is not interesting. 

Efficient approaches to discover combinatorial biomarkers, referred 
to as pattern mining techniques, have been developed in the field of 
data mining in the last decade (Agrawal and Srikant, 1994; Han et al., 
2007; Pang-Ning et al., 2006). These approaches were first designed to 
discover the combinations of items that are purchased together in large 
market basket datasets where each record (transaction) has a list of 
items that are purchased by one customer (Agrawal and Srikant, 
1994; Agrawal et al., 1993). The pattern mining techniques draw their 
efficiency from the anti-monotonicity property which guarantees that 
if a combination of items is not frequently purchased together, then any 
combination that includes all of these items is not frequently purchased either 
(Agrawal and Srikant, 1994). This property is also referred to as the 
"Apriori principle." Fig. 2(c) shows the set of all possible combinations 
of items (A, B, C, D) (Xiong et al., 2006) in the form of a lattice, where 
each node represents one combination of items. Frequent pattern min- 
ing techniques typically search this lattice depicting all possible combi- 
nations. Once a combination is found to be infrequent then all 
combinations that are extensions of the infrequent combinations are 
not enumerated. In Fig. 2(c), when the combination AB is found to be infrequent, 
all its supersets are excluded from being enumerated and tested. 
Since the early 1990s, several efficient algorithms to explore the 
search space have been developed (Coatney and Parthasarathy, 2005; 
Han et al., 2000; Zaki, 2000; Zaki and Hsiao, 1999). Some of these algo- 
rithms have also been found to be promising in bioinformatics problems 
involving gene expression and protein interaction network datasets 
(Atluri et al., 2000; Atluri et al., 2009; Bellay et al., 2011; Gupta et al., 
2011; Pandey et al., 2009). 
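
The pruning described above can be sketched as a level-wise search. The code below is a compact, unoptimized illustration of the Apriori idea and not one of the cited algorithms; the function name `apriori`, the toy transactions, and the support threshold are assumptions. In the neuroimaging setting, a "transaction" would correspond to a subject and an "item" to a binary feature that is present for that subject.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    current = [frozenset([item]) for item in items]  # level 1 candidates
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Build size-k candidates from frequent (k-1)-itemsets and keep only
        # those whose every (k-1)-subset is frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent


# Hypothetical data in which the combination AB is infrequent.
transactions = [
    {"A", "C", "D"}, {"A", "C", "D"},
    {"B", "C", "D"}, {"B", "C", "D"},
    {"A", "B"}, {"A", "C"},
]
result = apriori(transactions, min_support=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

Because AB occurs in only one transaction, the candidates ABC, ABD, and ABCD are never counted: each of them has AB as a subset and is discarded during candidate generation, mirroring the pruned region of the lattice in Fig. 2(c).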

Pattern mining techniques for discovering combinatorial biomarkers 
have been proposed for gene expression datasets (Fang et al., 2012a, 
2012b), where the goal is to find combinations of genes that are all highly 
expressed in subjects with cancer and not expressed together in 
healthy subjects. These techniques have shown promise in discovering 
biomarkers from genomic lung cancer data sets. These techniques 
have also been extended to work with continuous valued gene expres- 
sion datasets (Fang et al., 2010). 

The challenge of statistical power posed by the large search space of 
combinatorial biomarkers can be addressed by estimating a False Discovery 
Rate (FDR) and retaining only those biomarkers that are robust to 
multiple hypothesis testing. An approach to compute FDR for bio- 
markers is to first use randomized datasets to discover combinatorial 
biomarkers and then to compare the association strength of real bio- 
markers with those discovered from a randomized dataset. Note that 
this approach takes into account the multiple hypothesis testing as the 
combinatorial biomarkers are discovered from real and randomized 
datasets by exploring an exponentially large search space. 
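
The randomization procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any cited study: only pairs of features are scored (a real search would cover larger combinations), the association strength is taken to be the absolute difference in support between cases and controls, randomized datasets are produced by shuffling the phenotype labels, and the data, threshold, and function names are hypothetical.

```python
import random
from itertools import combinations


def pattern_scores(X, y, size=2):
    """Score each feature combination by |support in cases - support in controls|."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    scores = {}
    for combo in combinations(range(len(X[0])), size):
        s_case = sum(all(r[f] for f in combo) for r in cases) / len(cases)
        s_ctrl = sum(all(r[f] for f in combo) for r in controls) / len(controls)
        scores[combo] = abs(s_case - s_ctrl)
    return scores


def empirical_fdr(X, y, threshold, n_perm=100, seed=0):
    """Estimate FDR as (mean discoveries on shuffled labels) / (real discoveries)."""
    rng = random.Random(seed)
    n_real = sum(s >= threshold for s in pattern_scores(X, y).values())
    labels = list(y)
    null_total = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any true feature-phenotype association
        null_total += sum(s >= threshold
                          for s in pattern_scores(X, labels).values())
    expected_false = null_total / n_perm
    fdr = min(1.0, expected_false / n_real) if n_real else 0.0
    return n_real, fdr


# Hypothetical data: features 0 and 1 co-occur in 15 of 20 cases and in no control.
cases = [[1, 1, 0, 0]] * 15 + [[0, 0, 1, 0]] * 5
controls = [[1, 0, 0, 1]] * 10 + [[0, 1, 1, 0]] * 10
X = cases + controls
y = [1] * 20 + [0] * 20

n_found, fdr = empirical_fdr(X, y, threshold=0.6)
print("patterns above threshold:", n_found, "estimated FDR:", fdr)
```

Because the same search is run on the real and on the label-shuffled data, the null counts already reflect the number of combinations explored, which is how this kind of estimate accounts for multiple hypothesis testing.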

BENCH (Biclustering-driven ENsemble of Classifiers), developed by 
Padmanabhan et al. (2012), is another combinatorial biomarker discov- 
ery approach specifically designed for highly underdetermined prob- 
lems (i.e., the number of features is much higher than the number of 
subjects/patients). The method is specifically tailored for the cases 
that exhibit different discriminatory signatures between subgroups of 
samples without any prior knowledge about subgroupings. These com- 
binatorial techniques would represent a novel approach to discovery of 
large-scale connectivity biomarkers in neuroimaging data. Because 
these approaches were primarily designed for binarized data, potential 
loss of information has discouraged their use in clinical studies. 

To the best of our knowledge, combinatorial biomarker based approaches 
have not been used in the neuroimaging literature. One reason 
for this is the lack of strategies to transform continuous features 
obtained from neuroimaging technologies to binary features that most 
combinatorial techniques work with. This gap needs to be addressed be- 
fore new studies could reap the benefits of these approaches to explore 
combinations of features effectively as well as their ability to discover 
subgroups in disease subjects. 

4. Pathway biomarkers 

The functionality of the brain is known to be a coordinated effort of 
multiple regions. For example, the brain processes sensory information 
with the help of a salience network that encompasses functional 
connections between the bilateral insula and anterior cingulate cortex. 
Some known brain subnetworks are: (i) default mode network (DMN), 
(ii) salience network (SN), and (iii) central executive network (CEN) 
(Lee et al., 2012). Exploring the association of these brain subnetworks 
with disease will enable researchers to study the relationship between 
these subnetworks and their role in mental disorders. Motivated by the 
progress of finding associations between known biological pathways 
and common complex diseases in genomic data analysis, we refer to 
this type of biomarker as a 'pathway' biomarker (Holden et al., 2008; 
Medina et al., 2009; Subramanian et al., 2005; Vandin et al., 2011; 
Wang et al., 2009; Zhang et al., 2010). The fact that such biomarkers conform 
to existing knowledge allows investigators to interpret their role in 
disease. In fact, a few neuroimaging studies (Calhoun et al., 2008; Kim 
et al., 2010; Öngür et al., 2010; Palaniyappan et al., 2010; Sun et al., 
2009; White et al., 2010; Woodward et al., 2011) have investigated the 
association of known subsystems with a disease and found promising results. 
For example, Woodward et al. (2011) found an association of functional 
connections within the DMN and CEN with SZ. 

The most common connectivity biomarker tested in AD is the DMN, 
which has shown direct correlations between edges of the network and 
cognition (Hedden et al., 2009). There have been multiple attempts to 
use measures of DMN function as a biomarker for early diagnosis (e.g., 
Greicius, 2008); however, intersubject variability is currently 
too high for use in individual subjects. Most studies in this category 
typically choose a subnetwork of interest and investigate its 
association with the disease; this may result in spurious findings, as 
the subnetworks not considered in the study could be more relevant 
to AD. Therefore, these subnetworks should be studied in comparison 
with each other and not in isolation. As such, data-driven approaches 
to exploring multiple functional networks, such as the twelve resting 
state networks identified by Greicius (2008), have the potential for en- 
hanced accuracy. 

Fig. 3. Illustration of a 'pathway' based biomarker discovery approach. The features (often edges in the brain networks) are evaluated individually and then the functional groups (resting state networks) are evaluated for enrichment with highly significant features (edges). 

One simple approach to discover associations of brain pathways 
with a phenotype is to compute the significance score of association 
for every edge in a brain network and then test the statistical significance 
of association of a brain pathway based on the scores of its constituent 
edges. This framework is shown in Fig. 3, where each edge in the 
brain network is referred to as a feature, and groups of features that 
are in known brain pathways are referred to as functional groups. The 
significance of association for a brain pathway, generally referred to as 
the enrichment score, is obtained by comparing the distribution of association 
scores of its constituent edges with that of association scores from 
a random selection of edges. A related approach to discover brain pathway 
associations is to first rank all edges based on their association 
with the phenotype and then test, for each brain pathway, whether its constituent 
edges are all at the highly associated end of the ranking. Permutation 
based testing can also be used to quantify the significance of the brain 
pathway associations. A similar approach has been pursued in genetic 
association studies, referred to as Gene Set Enrichment Analysis 
(GSEA) (Subramanian et al., 2005). Several extensions of GSEA and 
other related approaches have been proposed in the literature 
(Holden et al., 2008; Medina et al., 2009; Wang et al., 2009). (A good 
survey of these approaches is available in Wang et al. (2010).) These 
variations include the choice of scoring a genomic feature for association 
(e.g., p-values from t-test or chi-squared test), determining the enrich- 
ment score for a pathway as the minimum of p-values of the features 
contained in the pathway (Medina et al., 2009), and the choice of approach 
for estimating the statistical significance of pathway enrichment 
(e.g., phenotype based permutation or feature set permutation). 
Note that the success of this class of approaches is limited by the 
strength of association of individual features. 
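
A feature set permutation test of the kind described above can be sketched in a few lines. The sketch assumes that a per-edge association score has already been computed (higher meaning more strongly associated) and uses the mean score of a pathway's edges as its enrichment statistic; this is a simplification relative to GSEA's ranking-based statistic. The edge names, scores, pathways, and the function name `enrichment_p_value` are all hypothetical.

```python
import random


def enrichment_p_value(edge_scores, pathway_edges, n_perm=2000, seed=0):
    """Compare a pathway's mean edge score with that of random edge sets of equal size."""
    rng = random.Random(seed)
    all_edges = list(edge_scores)
    k = len(pathway_edges)
    observed = sum(edge_scores[e] for e in pathway_edges) / k
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_edges, k)  # random "pathway" of the same size
        if sum(edge_scores[e] for e in sample) / k >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical scores for 50 edges, e.g., derived from per-edge test statistics.
edge_scores = {f"e{i}": i / 50 for i in range(50)}
top_pathway = [f"e{i}" for i in range(45, 50)]      # the five highest-scoring edges
mixed_pathway = ["e0", "e12", "e25", "e37", "e49"]  # edges spread across the ranking

obs_top, p_top = enrichment_p_value(edge_scores, top_pathway)
obs_mixed, p_mixed = enrichment_p_value(edge_scores, mixed_pathway)
print("top pathway:  ", round(obs_top, 3), p_top)
print("mixed pathway:", round(obs_mixed, 3), p_mixed)
```

The pathway made up of the highest-scoring edges receives a small p-value, while the pathway whose edges are spread across the ranking does not. Since the statistic is built from individual edge scores, a pathway whose edges are only jointly informative would go undetected, which is the limitation noted above.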

Variants of linear and combinatorial biomarkers have the potential 
to address this limitation. A variant of LASSO, group-LASSO, can select 
a set of edges in the dataset that are known to be part of a brain 
subnetwork and are associated with the phenotype. Group-LASSO techniques 
select all or none of the edges from a given group when they estimate 
β. This approach generally discovers the best brain subnetwork 
that is associated with the phenotype in question. Moreover, it has the 
potential to discover combinations that can be formed by features that 
may not be individually associated with the phenotype in question. Dis- 
criminative pattern mining techniques can also be used to discover 
pathway biomarkers by constraining the search space of the patterns 
to only those that fall under known brain pathways. This reduces the 
computational complexity of the pattern mining technique and can 
also improve the statistical significance as the number of hypotheses 
generated is restricted to those groups of edges that fall within known 
pathways. 
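
The all-or-none behavior of group-LASSO comes from the way its penalty shrinks each group of weights jointly. The sketch below shows only that shrinkage step (the proximal operator of the penalty λ·Σ_g ||β_g||₂), not a complete group-LASSO solver; the function name, the weights, the grouping into two "pathways", and the penalty value are hypothetical.

```python
import math


def group_soft_threshold(z, groups, lam):
    """Shrink each group of weights jointly; a group with norm <= lam becomes all zero."""
    out = list(z)
    for idx in groups:
        norm = math.sqrt(sum(z[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * z[i]
    return out


# Hypothetical weights for four edges grouped into two known pathways.
weights = [3.0, 4.0, 0.3, -0.4]
pathways = [[0, 1], [2, 3]]
print(group_soft_threshold(weights, pathways, lam=1.0))
```

The first pathway has a large joint norm, so both of its edges are kept (with shrunken weights), while the second pathway falls below the penalty and both of its edges are set to zero together. An edge therefore enters or leaves the model only along with the rest of its pathway.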

The DENSE (Dense and ENriched Subgraph Enumeration) method 
developed by Hendrix et al. (2011) is a fast and theoretically guaranteed 
method that can take as input prior knowledge defined as a set of 
query nodes from a brain network and enumerate all the dense subnetworks 
in the brain network that contain a user-defined percentage of the 
query nodes. While this method may not be directly applicable to identifying 
biomarkers common to a group of subjects, as it works on one 
network at a time, it is very useful for refining identified biomarkers. 
If the nodes of a brain pathway can be provided as input, then a 
particular subject's network could be analyzed to identify some of the 
peripheral nodes and edges that are associated with the biomarker 
that can offer more information about the subject under analysis. 
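
The kind of query that DENSE answers can be made concrete with a brute-force sketch. This is not the algorithm of Hendrix et al. (2011), which avoids exhaustive enumeration; it only illustrates the input and output on a small network. Density is assumed here to mean the fraction of possible edges present, and the percentage constraint is read as the fraction of query nodes that a subnetwork covers. Both definitions, and all names, are assumptions made for illustration.

```python
from itertools import combinations


def dense_subnetworks(edges, query, min_density=0.8, min_query_frac=0.5, max_size=5):
    """Enumerate node sets (size 3..max_size) that are dense and cover the query.

    edges is a list of (u, v) pairs from a single subject's network.
    Exhaustive search: only practical for very small networks.
    """
    nodes = sorted({n for e in edges for n in e})
    edge_set = {frozenset(e) for e in edges}
    query = set(query)
    found = []
    for k in range(3, max_size + 1):
        for subset in combinations(nodes, k):
            covered = len(query & set(subset)) / len(query)
            if covered < min_query_frac:
                continue
            present = sum(frozenset(p) in edge_set for p in combinations(subset, 2))
            if present / (k * (k - 1) / 2) >= min_density:
                found.append(subset)
    return found
```

Any node returned alongside the query nodes is a candidate peripheral node for the subject under analysis.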

There are several clinical applications of pathway biomarker type 
approaches in the context of investigating markers for SZ (Mamah 
et al., 2013; Orliac et al., 2013; Tu et al., 2013) and bipolar disorder 
(Mamah et al., 2013). Mamah et al. (2013) evaluated the role of mean
connectivity (obtained from resting state fMRI data) within five known
neural subnetworks (default mode, fronto-parietal, cingulo-opercular,
cerebellar, and salience networks) in SZ and bipolar disorder. They found
that the decrease in within-network connectivity in the cingulo-opercular
subnetwork is larger in SZ than in bipolar disorder. Orliac et al. (2013)
studied the functional connectivity within the default mode network (DMN)
and the salience network (SN) in SZ, while Tu et al. (2013) studied
disconnectivity in the fronto-parietal network in SZ. While these studies
show the usefulness of discovering pathway biomarkers, the methodologies
discussed above can provide a systematic way to discover them.
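
A within-network mean connectivity measure of the kind used in these studies can be sketched as follows. The data layout (one dictionary of pairwise connectivity values per subject) and the function names are assumptions for illustration and do not reproduce the pipeline of any cited study.

```python
from itertools import combinations
from statistics import mean


def within_network_connectivity(conn, members):
    """Mean connectivity over every pair of regions inside one subnetwork.

    conn maps frozenset({region_a, region_b}) to a connectivity value,
    e.g. a correlation between two resting state time series.
    """
    return mean(conn[frozenset(pair)] for pair in combinations(members, 2))


def group_difference(subjects_a, subjects_b, members):
    """Mean within-network connectivity of group A minus that of group B."""
    score_a = mean(within_network_connectivity(c, members) for c in subjects_a)
    score_b = mean(within_network_connectivity(c, members) for c in subjects_b)
    return score_a - score_b
```

Running group_difference once per known subnetwork gives one summary value per pathway, which can then be tested for association with the phenotype.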

5. Network biomarkers 

Neuroimaging data obtained using fMRI or DTI is naturally represent- 
ed in the form of a network, where nodes are brain regions and edges 
represent connections (physical or functional) (Bullmore and Sporns, 
2009). In this context, we define network biomarkers as features of the 
network that could explain group differences between healthy and
disease subjects. These features could be topological characteristics of
nodes, or subnetworks that have significantly different topological
properties in the two groups.

Topological properties of brain networks have the potential to offer 
insights into the functionality of the brain (Rubinov and Sporns, 
2010). Numerous studies have examined how topological properties
differ between healthy subjects and those with a brain disorder
(Camchong et al., 2011; Liu et al.,
2008; Lynall et al., 2010; Rotarska-Jagiela et al., 2010; Stam et al., 
2007). Table 1 presents a representative sample of these studies listing 
different topological properties, including degree of a node, clustering 
coefficient, robustness and efficiency, considered in each of these
studies. Graph-theoretic approaches have been applied to AD (Supekar et
al., 2008), demonstrating a loss of small-world properties of whole-brain
networks, and modest correlation with cognitive status. In addition, 
this approach has been used to look at the impact of different lesion pat- 
terns (e.g., diffuse vs. hub-targeted attacks) on global metrics. 
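
Three of the topological properties listed above (degree, clustering coefficient and global efficiency) can be computed for an unweighted, undirected network with the following sketch. It uses the standard graph-theoretic definitions; the representation of a network as a list of region pairs is an assumption, and isolated regions are ignored because they never appear in an edge.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Turn a list of undirected (u, v) edges into a dict of neighbour sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    return len(adj[node])


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def shortest_path_lengths(adj, source):
    """Breadth-first search distances (in edges) from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(adj):
    """Average inverse shortest-path length over all ordered node pairs.

    Unreachable pairs contribute zero. Requires at least two nodes.
    """
    n = len(adj)
    total = 0.0
    for s in adj:
        dist = shortest_path_lengths(adj, s)
        total += sum(1.0 / d for t, d in dist.items() if t != s)
    return total / (n * (n - 1))
```

Computing these values per subject and comparing their distributions between groups yields the simple network biomarkers of the kind summarized in Table 1.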

Given the complex nature of mental disorders such as AD and SZ, 
subgraph approaches that focus only on portions of the network may 
yield more accurate correlations with the disease in question. For exam- 
ple, a subnetwork in the brain can show different topological properties 
in healthy and disease groups that cannot be reflected in the individual 
properties of a node or an edge. For this reason, a set of nodes in a net- 
work that exhibits different topological properties in disease and 
healthy groups of subjects can also be treated as a network biomarker. 
One example is a group of nodes that are densely connected in one 
group of subjects compared to the other group (as shown in Fig. 4). An- 
other example is a subset of nodes whose diameter in the induced sub- 
graph is different between the two groups. Yet another example is a 
subset of nodes that play a critical role in the connectivity (in effect 



Table 1

A selective sample of studies that use network topological properties to explain group differences
in brain networks. rsfMRI: resting state fMRI data, MEG: Magnetoencephalography, EEG:
Electroencephalography, MRI: Magnetic Resonance Imaging.

Network topological properties: D: degree, CC: clustering coefficient, CPL: characteristic
path length, LE: local efficiency, GE: global efficiency, H: hubs, M: modularity, SW: small
worldness, R: robustness, CS: connection strength, CV: connectivity variance, CD:
connection distance, LCC: largest connected component, C: centrality. Rubinov and
Sporns (2010) provides a definition for all these properties and discusses their
usefulness in interpreting brain networks.

Citation                        Phenotype                            Neuroimaging data   Network properties
Alexander-Bloch et al. (2010)   Childhood-onset schizophrenia        rsfMRI              LE, CC, M, SW, R
Yu et al. (2011a)               Schizophrenia                        rsfMRI              CS, CC, H
Bassett et al. (2012)           Schizophrenia                        rsfMRI              CS, CV, LCC
Alexander-Bloch et al. (2013)   Childhood-onset schizophrenia        rsfMRI              GE, CC, SW, M, CD
Yu et al. (2011b)               Schizophrenia                        rsfMRI              D, CS, LE, GE, CPL, CC, SW
Liu et al. (2013)               Alzheimer's                          rsfMRI              CD, CC, GE
Zhang et al. (2011)             Major depressive disorder            rsfMRI              SW, GE, C
Alexander-Bloch et al. (2012)   Childhood-onset schizophrenia        rsfMRI              M
Lynall et al. (2010)            Schizophrenia                        rsfMRI              CC, SW, R
Wang et al. (2012)              Amnestic mild cognitive impairment   rsfMRI              CC, CPL, M, CS
Stam et al. (2009)              Alzheimer's                          MEG                 CC, CPL, R
Supekar et al. (2008)           Alzheimer's                          rsfMRI              CC, CPL
Buckner et al. (2009)           Alzheimer's                          rsfMRI              H
Chen et al. (2011)              Aging                                MRI                 M
Bassett et al. (2008)           Schizophrenia                        MRI                 D, CPL, CC, SW
Wu et al. (2012)                Aging                                MRI                 SW, M
Jalili and Knyazeva (2011)      Schizophrenia                        EEG                 SW, R, M
Cole et al. (2012)              Cognitive control and intelligence   rsfMRI              CS
Wu et al. (2012)                Schizophrenia                        rsfMRI              D, CS, CC, CPL, GE, LE
Liu et al. (2008)               Schizophrenia                        rsfMRI              D, CC, CPL, GE, LE, SW

[Fig. 4 panels: subject 1 to subject 6, with the candidate subnetwork shaded.]

Fig. 4. Illustration of a subgraph discriminating between three healthy subjects and three 
disease subjects. The figure shows 6 networks from 3 healthy and 3 disease subjects. The 
shaded region in these networks covers nodes that are densely connected in healthy sub- 
jects and sparsely connected in disease subjects. Discovering such novel sets of nodes or 
subnetworks is essential. 



Kami et al., 2009; Vandin et al., 2011; Wang et al., 2011). The advantage
of these approaches is that even when the strength of the strongly asso- 
ciated edges is not statistically significant, the subnetworks discovered 
can be statistically significant if they form a connected structure. A 
drawback of NBS approaches is that they cannot discover subnetworks 
when the individual edges are not associated with the phenotype, but 
are collectively associated. For example, consider a scenario
where two edges (frontal-caudate and frontal-amygdala) connect 
three brain regions (e.g., caudate, amygdala, frontal lobe) that interac- 
tively accomplish a task (mood); each edge by itself cannot capture 
this synergy of all the three relevant regions and so the above men- 
tioned approach will not find the individual edges to be associated 
with the task and hence the synergistic system will be missed. 
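
The edge-then-component procedure described for NBS can be sketched as below: compute a statistic per edge, keep the edges that exceed a threshold, take the largest connected component, and assess its size by permuting the subject labels. The Welch t statistic, the threshold value and all names are illustrative assumptions, not the reference implementation of Zalesky et al. (2010).

```python
import random
from math import sqrt
from statistics import mean, variance


def t_stat(values, labels):
    """Welch two-sample t statistic for one edge (labels are 0/1)."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def largest_component_size(edges):
    """Number of edges in the largest connected component formed by `edges`."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    counts = {}
    for u, _ in edges:
        root = find(u)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values(), default=0)


def network_based_statistic(data, labels, threshold, n_perm=1000, seed=0):
    """data maps an edge (u, v) to its per-subject connectivity values.

    Returns the observed component size and a permutation p-value for it.
    """

    def component_size(lab):
        supra = [e for e, vals in data.items() if abs(t_stat(vals, lab)) > threshold]
        return largest_component_size(supra)

    observed = component_size(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if component_size(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The first step filters edges one at a time, so a set of edges that is associated with the phenotype only jointly never reaches the component search. This is the drawback described above.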

A suite of network biomarker discovery techniques (Chen et al., 2012;
Padmanabhan et al., 2012; Schmidt and Samatova, 2009; Schmidt et al., 
2012) proposed in genomic data analysis can be potentially used in the 
context of neuroimaging datasets. Schmidt and Samatova's (2009)
α,β-motif finder algorithm is designed to discover cliques (subgraphs
in which every node is connected to every other node) in an underlying
network that are significantly associated with a phenotype. In order to
discover general network biomarkers, beyond cliques, Padmanabhan et al.
(2012) proposed an approach to find connected subgraph biomarkers.
SPICE (System Phenotype-related Interplaying Components Enumera- 
tor) (Chen et al., 2012) was proposed to discover subgraphs that explain 
only subsets of subjects. These techniques can significantly improve the 
state-of-the-art in network biomarker discovery from brain networks. 

Simpler versions of network biomarkers have been used in analyz- 
ing neuroimaging data in the past, as shown in Table 1. However, the 
above discussed network biomarker approaches are yet to be applied 
to this data to discover complex variants of network biomarkers. 



functionality) of the entire system (network), i.e., removing that set of
edges could affect the connectivity in one group of subjects more than 
the other group. These examples illustrate how network structure al- 
lows one to measure the impact of a selected set of nodes on the system 
(brain) as a whole and to understand the nature of connectivity within 
the subset of nodes to study the relationship between subgraph connec- 
tivity and the disease in question. 
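
Two of the subnetwork-level properties mentioned in these examples, the density of a candidate node set and the effect of deleting it on overall connectivity, can be computed per subject with the sketch below. The function names and the edge-list representation are assumptions made for illustration.

```python
from itertools import combinations


def induced_density(edges, nodes):
    """Fraction of possible edges among `nodes` that are present (needs >= 2 nodes)."""
    edge_set = {frozenset(e) for e in edges}
    pairs = list(combinations(sorted(nodes), 2))
    return sum(frozenset(p) in edge_set for p in pairs) / len(pairs)


def largest_component_after_removal(edges, removed):
    """Node count of the largest connected component once `removed` is deleted.

    Nodes left without any edge are ignored, so the result is 0 if no edge survives.
    """
    adj = {}
    for u, v in edges:
        if u in removed or v in removed:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adj[node] - seen:
                seen.add(nbr)
                stack.append(nbr)
        best = max(best, size)
    return best
```

A candidate node set becomes a network biomarker when a value such as its induced density, or the drop in component size caused by its removal, differs consistently between the healthy and disease groups.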

The key difference between the above examples of network bio- 
markers and the pathway biomarkers is that pathway biomarkers 
work with known subnetworks and test for their hypothesis-driven as- 
sociation with disease, whereas network biomarkers find subnetworks 
that are associated with the disease in an unbiased manner. Thus, con- 
ceptually all pathway biomarkers can be considered to be a subtype of 
network biomarkers. The advantage of hypothesis-driven focused path- 
way biomarker analyses is that the findings are bound to comply with 
well-studied subnetworks in the brain, are easy to interpret, and are less
subject to spurious findings. The disadvantage of using pathway bio- 
markers is that with this approach it may be impossible to find novel 
subnetworks that are hidden in the data but truly associated with a dis- 
ease. Further, many of our a priori assumptions may be wrong and in 
that case, searching only for known pathways may result in a confirma- 
tion bias and limit our ability to find true causes of disease. Network bio- 
markers are appropriate in these scenarios, where a global unbiased 
search of the data is required. However, they are computationally more 
intensive given the size of the search space of all possible subnetworks. 

One approach to derive subgraphs in the brain network whose dys- 
function resulted in the manifestation of a phenotype is to first find the 
edges that are associated with the phenotype individually; construct a 
network of these associated edges, and discover significantly densely 
connected regions in this network. Zalesky et al.'s Network Based Statistic
(NBS) approach (Zalesky et al., 2010) works in a similar fashion and
it discovers the largest connected component in the network of signifi- 
cantly associated edges. Similar approaches have also been employed in 
genomic case-control data analysis to identify protein networks that 
are associated with cancer (Chuang et al., 2007; Ideker and Sharan, 2008;



6. Concluding remarks 

In this manuscript we considered the nature of complex biomarkers 
being investigated in the recent literature and presented techniques 
that are designed in related areas of data mining, statistics, machine 
learning and bioinformatics. Most of the techniques presented here 
have been refined for over a decade since their inception and so they 
can be directly applied to study the hypotheses being considered in neu- 
roimaging studies. Thus there is significant potential for advancing the 
state of the art in complex biomarker discovery for neuroimaging data. 

Specifically, the current state of the art provides neuroimaging based 
biomarkers that are typically based on single features and, for some
disorders, e.g., AD, are good indicators of the mental disorder after it
has begun (Linden, 2012). However, complex neuroimaging biomarkers hold
out the promise of helping to predict high-risk subjects before disease
onset, can help us better understand the differences between various
mental disorders, e.g., schizophrenia and bipolar disorder, and can
provide insights where several subgroups exist, e.g., schizophrenia. The
techniques covered in this manuscript have demonstrated their ability to
find complex biomarkers in other domains and offer a promising set of
new tools for neuroimaging investigators. Indeed, since many
of these techniques have already been applied to genetic and clinical 
data, there is a real possibility of finding complex biomarkers spanning 
multimodal data, thus further enhancing the breadth and depth of our 
understanding of neurological disorders.